The World Happiness Report data were downloaded from www.kaggle.com. Happiness is scored according to six factors: economic production, social support, life expectancy, freedom, absence of corruption, and generosity.
The happiness survey uses the Cantril ladder question, which asks participants to imagine a ladder on which the best possible life for them is a 10 and the worst possible life is a 0, and then to rate their own current lives on that scale. I’m interested in: (1) the overall differences between Western Europe and Middle East and Northern Africa in 2015 (see table 1 and table 2); (2) the relationship between economy and happiness score (see scatterplot).
Table 1: Western Europe, 2015

| Variable | Mean | SD | min | max | % Missing |
|---|---|---|---|---|---|
| Dystopia Residual | 2.15 | 0.38 | 1.26 | 2.70 | 0 |
| Economy (GDP per Capita) | 1.30 | 0.10 | 1.15 | 1.56 | 0 |
| Family | 1.25 | 0.14 | 0.89 | 1.40 | 0 |
| Freedom | 0.55 | 0.15 | 0.08 | 0.67 | 0 |
| Generosity | 0.30 | 0.13 | 0.00 | 0.52 | 0 |
| Happiness Rank | 29.52 | 29.27 | 1.00 | 102.00 | 0 |
| Happiness Score | 6.69 | 0.82 | 4.86 | 7.59 | 0 |
| Health (Life Expectancy) | 0.91 | 0.03 | 0.87 | 0.96 | 0 |
| Standard Error | 0.04 | 0.01 | 0.02 | 0.06 | 0 |
| Trust (Government Corruption) | 0.23 | 0.15 | 0.01 | 0.48 | 0 |

Table 2: Middle East and Northern Africa, 2015

| Variable | Mean | SD | min | max | % Missing |
|---|---|---|---|---|---|
| Dystopia Residual | 1.98 | 0.54 | 0.33 | 3.09 | 0 |
| Economy (GDP per Capita) | 1.07 | 0.32 | 0.55 | 1.69 | 0 |
| Family | 0.92 | 0.24 | 0.47 | 1.22 | 0 |
| Freedom | 0.36 | 0.17 | 0.00 | 0.64 | 0 |
| Generosity | 0.19 | 0.11 | 0.06 | 0.47 | 0 |
| Happiness Rank | 77.60 | 43.21 | 11.00 | 156.00 | 0 |
| Happiness Score | 5.41 | 1.10 | 3.01 | 7.28 | 0 |
| Health (Life Expectancy) | 0.71 | 0.11 | 0.40 | 0.91 | 0 |
| Standard Error | 0.05 | 0.01 | 0.03 | 0.08 | 0 |
| Trust (Government Corruption) | 0.18 | 0.13 | 0.05 | 0.52 | 0 |
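The two descriptive tables could be produced along the following lines. This is a minimal sketch, not the code behind the report: the data are simulated stand-ins for the Kaggle file, and `describe_var` is a hypothetical helper introduced only for illustration.

```r
# Hypothetical stand-in data; the real analysis would read the 2015 Kaggle file.
set.seed(1)
happy_2015 <- data.frame(
  Region = rep(c("Western Europe", "Middle East and Northern Africa"),
               times = c(21, 20)),
  `Happiness Score` = c(rnorm(21, 6.7, 0.8), rnorm(20, 5.4, 1.1)),
  `Economy (GDP per Capita)` = c(rnorm(21, 1.3, 0.1), rnorm(20, 1.07, 0.3)),
  check.names = FALSE
)

# Hypothetical helper: the five statistics shown in each table row.
describe_var <- function(x) {
  c(Mean = mean(x, na.rm = TRUE),
    SD = sd(x, na.rm = TRUE),
    min = min(x, na.rm = TRUE),
    max = max(x, na.rm = TRUE),
    `% Missing` = 100 * mean(is.na(x)))
}

# One summary matrix per region: rows are variables, columns are statistics.
numeric_cols <- names(happy_2015)[sapply(happy_2015, is.numeric)]
tables <- lapply(split(happy_2015, happy_2015$Region), function(d) {
  round(t(sapply(d[numeric_cols], describe_var)), 2)
})
tables[["Western Europe"]]
```

Splitting by `Region` first keeps the two tables structurally identical, which makes them easy to compare row by row.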
Hypotheses:
H0: There is no difference in happiness scores between Western Europe and Middle East and Northern Africa.
H1: Happiness scores differ between Western Europe and Middle East and Northern Africa.
Normality test
##
## Shapiro-Wilk normality test
##
## data: `Happiness Score`[Region == "Western Europe"]
## W = 0.89516, p-value = 0.02823
##
## Shapiro-Wilk normality test
##
## data: `Happiness Score`[Region == "Middle East and Northern Africa"]
## W = 0.97138, p-value = 0.7837
The Western Europe scores depart from normality (p = .028), while the Middle East and Northern Africa scores do not (p = .784), so the group comparison uses the unpaired two-sample Wilcoxon test.
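The decision rule above can be sketched as follows. The scores are simulated, so the p-values will not match the output shown; `scores`, `p_normal`, and `use_wilcoxon` are illustrative names, and the 0.05 threshold is the conventional cutoff the text appears to use.

```r
set.seed(42)
# Hypothetical scores for the two regions (simulated, not the report's data).
scores <- data.frame(
  Region = rep(c("Western Europe", "Middle East and Northern Africa"),
               times = c(21, 20)),
  score = c(rnorm(21, 6.7, 0.8), rnorm(20, 5.4, 1.1))
)

# Shapiro-Wilk p-value within each region.
p_normal <- tapply(scores$score, scores$Region,
                   function(x) shapiro.test(x)$p.value)

# Use the rank-based test if either group departs from normality.
use_wilcoxon <- any(p_normal < 0.05)
p_normal
use_wilcoxon
```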
Homogeneity in Variances Check
Do the two populations have the same variances?
##
## F test to compare two variances
##
## data: Happiness Score by Region
## F = 1.7841, num df = 19, denom df = 20, p-value = 0.2077
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
## 0.7187759 4.4760931
## sample estimates:
## ratio of variances
## 1.784056
There is no significant difference between the variances of the two groups (F(19, 20) = 1.78, p = .208).
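A variance check of this kind can be run with `var.test()` and a formula. The sketch below uses simulated scores, with group sizes of 20 and 21 chosen to match the reported degrees of freedom. R orders the group labels alphabetically, so here "Middle East and Northern Africa" supplies the numerator variance, which would be consistent with the `num df = 19` in the output above.

```r
set.seed(42)
# Hypothetical scores; group sizes chosen to match the reported degrees of freedom.
scores <- data.frame(
  Region = rep(c("Western Europe", "Middle East and Northern Africa"),
               times = c(21, 20)),
  score = c(rnorm(21, 6.7, 0.8), rnorm(20, 5.4, 1.1))
)

# F test for equality of two variances (ratio = first level / second level).
ft <- var.test(score ~ Region, data = scores)
ft$parameter   # numerator and denominator degrees of freedom
ft$estimate    # ratio of sample variances
ft$p.value
```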
Wilcoxon Test
## statistic p.value method
## 1 69 0.0002477983 Wilcoxon rank sum test with continuity correction
## alternative
## 1 two.sided
Western Europe’s happiness scores differ significantly from Middle East and Northern Africa’s scores (W = 69, p < .001).
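For readers who want to reproduce the shape of this test, here is a minimal sketch on simulated scores. In R, W counts the pairs in which an observation from the first group exceeds one from the second, so dividing W by the number of pairs gives a rough effect size. Assuming group sizes of 20 and 21 (implied by the F test's degrees of freedom) and that the first group is Middle East and Northern Africa, the reported W = 69 would correspond to 69 / 420, or about 16% of pairs.

```r
set.seed(42)
# Hypothetical scores for the two regions (simulated, not the report's data).
scores <- data.frame(
  Region = rep(c("Western Europe", "Middle East and Northern Africa"),
               times = c(21, 20)),
  score = c(rnorm(21, 6.7, 0.8), rnorm(20, 5.4, 1.1))
)

# Unpaired two-sample Wilcoxon test; exact = FALSE requests the normal
# approximation with continuity correction, as in the output above.
wt <- wilcox.test(score ~ Region, data = scores, exact = FALSE)

# Share of cross-group pairs in which the first group has the higher score.
n_by_region <- table(scores$Region)
prop_pairs <- unname(wt$statistic) / prod(n_by_region)
wt$p.value
prop_pairs
```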
Normality Test
##
## Shapiro-Wilk normality test
##
## data: happy_2015$`Health (Life Expectancy)`
## W = 0.93521, p-value = 1.344e-06
##
## Shapiro-Wilk normality test
##
## data: happy_2015$`Economy (GDP per Capita)`
## W = 0.96502, p-value = 0.0004967
Neither Health (Life Expectancy) nor Economy (GDP per Capita) is normally distributed (both p < .001), so both variables are log-transformed (log_health and log_economy) before the correlation analysis.
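The effect of a log transformation on a normality check can be illustrated with a simulated right-skewed variable. This is only a sketch: `x` is hypothetical and is not one of the report's columns. Note that `log()` returns `-Inf` for zeros, so a variable containing zeros would need separate handling; how the report dealt with that is not stated.

```r
set.seed(7)
# Hypothetical right-skewed variable with the same sample size as the data (158).
x <- rlnorm(158, meanlog = -0.3, sdlog = 0.5)

# Shapiro-Wilk p-values before and after the log transformation.
p_raw <- shapiro.test(x)$p.value
p_log <- shapiro.test(log(x))$p.value
c(raw = p_raw, log = p_log)
```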
Kendall Correlation
## Call:corr.test(x = ., method = "kendall")
## Correlation matrix
## Happiness Score log_health log_economy
## Happiness Score 1.00 0.55 0.59
## log_health 0.55 1.00 0.65
## log_economy 0.59 0.65 1.00
## Sample Size
## [1] 158
## Probability values (Entries above the diagonal are adjusted for multiple tests.)
## Happiness Score log_health log_economy
## Happiness Score 0 0 0
## log_health 0 0 0
## log_economy 0 0 0
##
## To see confidence intervals of the correlations, print with the short=FALSE option
##
## Confidence intervals based upon normal theory. To get bootstrapped values, try cor.ci
## raw.lower raw.r raw.upper raw.p lower.adj upper.adj
## HppnS-lg_hl 0.44 0.55 0.65 0 0.44 0.65
## HppnS-lg_cn 0.48 0.59 0.69 0 0.46 0.70
## lg_hl-lg_cn 0.55 0.65 0.73 0 0.53 0.75
##
## Kendall's rank correlation tau
##
## data: cor_data$`Happiness Score` and cor_data$log_health
## z = 10.334, p-value < 2.2e-16
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
## tau
## 0.5541848
##
## Kendall's rank correlation tau
##
## data: cor_data$`Happiness Score` and cor_data$log_economy
## z = 11.057, p-value < 2.2e-16
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
## tau
## 0.592945
Kendall correlation indicated a moderately strong, significant positive relationship between Happiness Score and Health (Life Expectancy) (τ = .55 [95% CI: .44; .65], N = 158, p < .001), and a similar significant positive relationship between Happiness Score and Economy (GDP per Capita) (τ = .59 [95% CI: .48; .69], N = 158, p < .001).
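A single Kendall test like the ones above can be sketched with `cor.test()`. The data here are simulated, and `cor_data` only borrows the object name from the output. As a side observation, Kendall's tau depends only on ranks, so a strictly increasing transformation such as the logarithm leaves it unchanged, which the last lines check.

```r
set.seed(3)
# Hypothetical data frame named after the cor_data object in the output above.
health <- runif(158, 0.1, 1)
cor_data <- data.frame(
  `Happiness Score` = 3.3 + 3.4 * health + rnorm(158, sd = 0.8),
  log_health = log(health),
  check.names = FALSE
)

# Kendall's rank correlation with its significance test.
kt <- cor.test(cor_data$`Happiness Score`, cor_data$log_health,
               method = "kendall")
kt$estimate

# Tau is rank-based: raw and log-transformed predictors give the same value.
tau_raw <- cor(cor_data$`Happiness Score`, health, method = "kendall")
tau_log <- cor(cor_data$`Happiness Score`, cor_data$log_health,
               method = "kendall")
all.equal(tau_raw, tau_log)
```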
A simple linear regression was run to investigate the degree to which Health (Life Expectancy) predicts Happiness Score.
| | Dependent variable: Happiness Score |
|---|---|
| Health (Life Expectancy) | 3.356*** (0.256) |
| Constant | 3.261*** (0.173) |
| Observations | 158 |
| R2 | 0.524 |
| Adjusted R2 | 0.521 |
| Residual Std. Error | 0.792 (df = 156) |
| F Statistic | 172.052*** (df = 1; 156) |
| Note: | \*p<0.1; \*\*p<0.05; \*\*\*p<0.01 |
A significant regression equation was found (F(1, 156) = 172.05, p < .001), with an R² of 0.52. A country’s predicted Happiness Score is equal to 3.26 + 3.36 × Health (Life Expectancy): the predicted score increases by 3.36 for each one-unit increase in Health (Life Expectancy).
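As a worked example of the fitted equation, the sketch below plugs in the two regional mean Health values from the descriptive tables (0.91 and 0.71). `predict_happiness` is a hypothetical helper, and the coefficients are copied from the regression table rather than re-estimated.

```r
# Coefficients copied from the regression table above.
intercept <- 3.261
slope <- 3.356

# Hypothetical helper: predicted Happiness Score for a given Health value.
predict_happiness <- function(health) intercept + slope * health

# Regional mean Health values taken from the two descriptive tables.
predicted <- predict_happiness(c(0.91, 0.71))
round(predicted, 2)
```

The predictions come to roughly 6.31 and 5.64, against observed regional means of 6.69 and 5.41, so Health alone accounts for only part of the gap between the two regions.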
Residual Diagnostics Plots
Let’s take a look at some plots of residual diagnostics.
A solid horizontal line separates positive from negative residuals; on a rough check, the residuals are scattered about equally on both sides.
The Quantile-Quantile plot is close to a straight line, which supports the linear model’s assumption that the residuals are normally distributed.
Normality test on residuals
##
## Shapiro-Wilk normality test
##
## data: resid(reg)
## W = 0.98078, p-value = 0.02677
With a p-value below .05 (p = .027), the residuals are not normally distributed.
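The residual checks can be sketched numerically as well as graphically. The example below fits a model to simulated data (`reg` mirrors the object name in the output, but the data are hypothetical) and computes a numeric counterpart of the Q-Q plot alongside the Shapiro-Wilk test. With 158 observations the Shapiro-Wilk test can flag small departures, so a W close to 1 and a near-straight Q-Q plot can coexist with p < .05.

```r
set.seed(11)
# Hypothetical data; `reg` mirrors the model object named in the output above.
health <- runif(158, 0, 1)
happiness <- 3.3 + 3.4 * health + rnorm(158, sd = 0.8)
reg <- lm(happiness ~ health)

res <- resid(reg)
# Least-squares residuals with an intercept average to zero by construction.
mean_res <- mean(res)

# Numeric counterpart of the Q-Q plot: correlation between theoretical
# normal quantiles and the ordered residuals (values near 1 mean a straight line).
qq <- qqnorm(res, plot.it = FALSE)
qq_cor <- cor(qq$x, qq$y)

# Formal test of residual normality.
sw <- shapiro.test(res)
c(mean_residual = mean_res, qq_correlation = qq_cor, shapiro_p = sw$p.value)

# plot(reg, which = 1:2) would draw the residuals-vs-fitted and Q-Q plots.
```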
Description and EDA: (1) The two tables display descriptive statistics for the two regions, Western Europe and Middle East and Northern Africa; the differences between them are examined in the group comparison. (2) Both scatter plots show linear relationships: between Happiness Score and Economy (GDP per Capita), and between Happiness Score and Health (Life Expectancy).
Group comparison and correlation analysis: (1) Western Europe’s happiness score is significantly different from Middle East and Northern Africa’s score (p < .001). (2) There are moderately strong, significant positive relationships between Happiness Score and Economy (GDP per Capita), and between Happiness Score and Health (Life Expectancy).
Linear regression: Health (Life Expectancy) is a significant predictor of Happiness Score. The residual diagnostic plots look acceptable, but the Shapiro-Wilk test indicates that the residuals are not normally distributed.